Beyond Word Frequency: Bursts, Lulls, and Scaling in the Temporal Distributions of Words

نویسندگان

  • Eduardo G. Altmann
  • Janet B. Pierrehumbert
  • Adilson E. Motter
چکیده

BACKGROUND Zipf's discovery that word frequency distributions obey a power law established parallels between biological and physical processes, and language, laying the groundwork for a complex systems perspective on human communication. More recent research has also identified scaling regularities in the dynamics underlying the successive occurrences of events, suggesting the possibility of similar findings for language as well. METHODOLOGY/PRINCIPAL FINDINGS By considering frequent words in USENET discussion groups and in disparate databases where the language has different levels of formality, here we show that the distributions of distances between successive occurrences of the same word display bursty deviations from a Poisson process and are well characterized by a stretched exponential (Weibull) scaling. The extent of this deviation depends strongly on semantic type -- a measure of the logicality of each word -- and less strongly on frequency. We develop a generative model of this behavior that fully determines the dynamics of word usage. CONCLUSIONS/SIGNIFICANCE Recurrence patterns of words are well described by a stretched exponential distribution of recurrence times, an empirical scaling that cannot be anticipated from Zipf's law. Because the use of words provides a uniquely precise and powerful lens on human thought and activity, our findings also have implications for other overt manifestations of collective human dynamics.

برای دانلود رایگان متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

منابع مشابه

یک مدل موضوعی احتمالاتی مبتنی بر روابط محلّی واژگان در پنجره‌های هم‌پوشان

A probabilistic topic model assumes that documents are generated through a process involving topics and then tries to reverse this process, given the documents and extract topics. A topic is usually assumed to be a distribution over words. LDA is one of the first and most popular topic models introduced so far. In the document generation process assumed by LDA, each document is a distribution o...

متن کامل

English Vocabulary for Equine Veterans: How Different from GSL and AWL Words

ESP students are usually suggested to master general and academic word lists such as Wests’ (1953) General Service List (GSL) and Coxhead’s (2000) Academic Word List (AWL) to be able to read their academic texts. However, it seems that university students may not need to learn all the words in the two lists as some words in the lists are of less frequency in academic texts. Moreover, there are ...

متن کامل

Estimating IDF based on daily precipitation using temporal scale model

The intensity –duration –frequency (IDF) curves play most important role in watershed management, flood control and hydraulic design of structures. Conventional method for calculating the IDF curves needs hourly rainfall data in different durations which is not extensively available in many regions. Instead 24-hour precipitation statistics were measured in most rain-gauge stations. In this stud...

متن کامل

Vocabulary Lists for EAP and Conversation Students

Despite the abundance of research investigating general and academic vocabularies and developing dozens of word lists, few studies have compared academic vocabulary with general service word lists such as conversation vocabulary. Many EAP researchers assume that university students need to know all the words in West’s (1953) General Service List (GSL) as a prerequisite to academic words (e.g., ...

متن کامل

The Effect of Word Meaning on Speech DysFluency in Adults with Developmental Stuttering

Objectives: Stuttering is one of the most prevalent speech and language disorders. Symptomology of stuttering has been surveyed from different aspects such as biological, developmental, environmental, emotional, learning and linguistic. Previous researches in English-speaking people have suggested that some linguistic features such as word meanings may play a role in the frequency of speech non...

متن کامل

ذخیره در منابع من


  با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید

عنوان ژورنال:

دوره 4  شماره 

صفحات  -

تاریخ انتشار 2009